Fix inconsistency between grapheme_substr() and substr() #6163

Closed
wants to merge 1 commit into from

kocsismate (Member)

grapheme_substr("", 0, $length) should return an empty string no matter what integer value $length has. This wasn't the case until now, so let's make this behaviour consistent with substr().

A remaining inconsistency is that grapheme_substr("abc", strlen("abc"), $length) throws an exception, whereas substr("abc", strlen("abc"), $length) returns "" and substr("abc", strlen("abc") + 1, $length) returns false. It seems to me that substr() is inconsistent even with itself, but it's quite possible that I've forgotten about this change and the reasoning behind it.

@kocsismate changed the title from "Fix consistency between grapheme_substr() and substr()" to "Fix inconsistency between grapheme_substr() and substr()" on Sep 18, 2020
@chschneider (Contributor)

The scope is much bigger than just grapheme_substr("", 0, ...);

Please revisit all the changes as there are lots of inconsistencies and backward compatibility breaks due to the new exceptions!

-    if ( OUTSIDE_STRING(lstart, str_len)) {
+    if (str_len == 0 && lstart == 0) {
+        RETURN_EMPTY_STRING();
+    } else if (OUTSIDE_STRING(lstart, str_len)) {

Member

I think the real problem is in OUTSIDE_STRING, which should only consider lstart > str_len to be out-of-bounds, not lstart >= str_len. "One past the end" is generally considered a valid offset (also in strpos() etc.).

That said, this is only a "definitely wrong" check: since this is a grapheme cluster offset, the actual offset check happens later...
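
To make the off-by-one concrete, here is a minimal C sketch of the two bounds predicates being compared. The function names are hypothetical and this is not the OUTSIDE_STRING macro itself. It only models the `>=` versus `>` distinction for a start offset against a length, and it assumes that negative offsets count back from the end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Shared handling for negative offsets (assumed to count back from the end).
 * The offset is out of bounds when -start > len. This is written as
 * -(start + 1) >= len so that negating the most negative long cannot overflow. */
static bool before_start(long start, size_t len) {
    return (size_t)(-(start + 1)) >= len;
}

/* Strict variant (hypothetical): an offset equal to the length is rejected. */
static bool outside_strict(long start, size_t len) {
    if (start < 0) return before_start(start, len);
    return (size_t)start >= len;
}

/* Relaxed variant (hypothetical): "one past the end" (start == len) is accepted. */
static bool outside_relaxed(long start, size_t len) {
    if (start < 0) return before_start(start, len);
    return (size_t)start > len;
}

/* Shows the inputs on which the two variants disagree. */
static void demo_outside_checks(void) {
    assert(outside_strict(0, 0));    /* empty string, offset 0: rejected */
    assert(!outside_relaxed(0, 0));  /* ...but accepted by the relaxed check */
    assert(outside_strict(3, 3));    /* offset == length: rejected */
    assert(!outside_relaxed(3, 3));  /* ...accepted as one past the end */
    assert(outside_relaxed(4, 3));   /* beyond one past the end: still rejected */
    assert(!outside_strict(2, 3) && !outside_relaxed(2, 3)); /* inside: both accept */
}
```

With the relaxed predicate, the empty-string case needs no special branch, because an offset of 0 against a length of 0 is simply "one past the end".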

@chschneider (Contributor) Sep 21, 2020

I think even lstart > str_len should be allowed for grapheme_substr() (and probably other functions). Whether it should return "" or false in that case is up for debate but throwing an exception IMHO cripples the API and makes it inconsistent with substr/mb_substr.
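
As an illustration of the non-throwing alternative, here is a minimal C sketch of one possible clamping policy: out-of-range arguments are pulled back into the string, so the result is an empty window rather than an error. The `slice_t` type and `clamp_slice` function are hypothetical, the sketch counts plain units rather than grapheme clusters, and it is not taken from php-src. It also assumes that a negative start counts from the end and a negative length stops that many units before the end.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: the window [start, start + length) into the string. */
typedef struct {
    size_t start;
    size_t length;
} slice_t;

/* Clamp (start, length) to a string of `len` units instead of reporting an error. */
static slice_t clamp_slice(size_t len, long start, long length) {
    slice_t out = {0, 0};

    if (start < 0) {
        /* back == -start, computed without overflowing on the most negative long */
        size_t back = (size_t)(-(start + 1)) + 1;
        out.start = back > len ? 0 : len - back;
    } else {
        /* a start beyond the end is clamped to the end */
        out.start = (size_t)start > len ? len : (size_t)start;
    }

    size_t avail = len - out.start;  /* units remaining after start */
    if (length < 0) {
        /* drop == -length: number of units to leave off at the end */
        size_t drop = (size_t)(-(length + 1)) + 1;
        out.length = drop > avail ? 0 : avail - drop;
    } else {
        out.length = (size_t)length > avail ? avail : (size_t)length;
    }
    return out;
}
```

Under this policy, `clamp_slice(3, 4, 5)` and `clamp_slice(0, 0, -1)` both yield a zero-length window. Whether such a case should instead map to false is the open question raised in this comment.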

Member Author

I'll look into this tonight

Member

I've applied a fix for the main problem in 1312c41 (there are still issues for the non-ASCII case). For the grapheme_substr() case, I would first like to decide on a final behavior for substr(), before we adjust this one (#6182).

@nikic (Member) commented Sep 23, 2020

Closing this in favor of #6182, which now also implements the update to grapheme_substr().

@nikic closed this on Sep 23, 2020